I. INTRODUCTION

Determining the optimal fault-tolerant compilation, or 
decomposition, of a quantum gate is critical for designing 
a quantum computer. Decomposition of single-qubit uni- 
tary gates into the {H, T} basis has been well studied in 
recent years. However, there have been few studies of de- 
composing into alternative bases, which may offer signif- 
icant improvements in circuit depth or resource cost. In 
this work, we consider the task of decomposing a single- 
qubit unitary gate into a sequence of gates drawn from 
the V basis, first introduced in Refs. [1] and [2]. Historically, this
basis was the first shown to be efficiently universal, in that the
length of the decomposition sequence is guaranteed to be of depth
$O(\log(1/\epsilon))$ [3]; however, the proof did not offer a
constructive algorithm. Recently, it has been shown that {H, T} is also
efficiently universal [4], [5], and the proofs are constructive. In this work, we
show that despite recent advances for the {H, T} basis, 
the V basis allows for significantly shorter decomposition 
circuits. 

We present two algorithms for compilation into the 
V basis. The first algorithm approximates single-qubit 
unitaries over the set consisting of the V basis and the 
Clifford group; the second approximates over the set consisting of the
V basis and the Pauli gates. The first algorithm runs in expected
polynomial time and delivers $\epsilon$-approximations with circuit
depth $\le 12\log_5(2/\epsilon)$. The second algorithm produces
$\epsilon$-approximations with circuit depth $\le 3\log_5(1/\epsilon)$
for most single-qubit unitaries, and approximations of circuit depth
$4\log_5(2/\epsilon)$ for edge cases. The compilation time is linear in
$1/\epsilon$ and thus exponential in $\log(1/\epsilon)$; in practice,
however, we find extremely short circuits (of length $L = 28$) at
precision level $\epsilon = 3\times 10^{-7}$ with merely 1 minute of
classical CPU time and modest space usage.

This work presents yet another alternative to Solovay-Kitaev
decomposition, and produces circuits with lengths matching the proven
lower bound of $O(\log(1/\epsilon))$ [3]. We note that our motivation
for studying decomposition into the V basis stems from two sources. The
first was the proof in Ref. [3] that it is efficiently universal. The
second was the recent protocol for distillation of non-stabilizer
states [6], which gives the first known fault-tolerant implementation
of one of the V basis gates using only magic states, Clifford
operations, and measurements.



II. RELATED WORK 

Recently, dramatic improvements have been achieved 
in quantum circuit compilation, in particular in the area 
of single-qubit decomposition. We highlight four devel- 
opments that are particularly relevant for interpreting 
our work in a more general context. 

The Programmable Ancilla Rotation (PAR) method 
for implementing arbitrary single-qubit rotations by re- 
source state teleportation [7] underlines the tradeoff of 
performing an approximating circuit directly on the tar- 
get logical qubit versus on resource ancilla states followed 
by a teleportation protocol to interact with the target 
qubit. An advantage of the method is that ancilla fac- 
tories can be employed which prepare resource states for 
later use, in exchange for performing a probabilistic cir- 
cuit on the target qubit which may require several at- 
tempts prior to success. The actual cost of approximat- 
ing a single-qubit unitary with this method is measured 
in terms of the number of resource states and the number 
of attempts required for success. 

More recently, a technique for distilling non-stabilizer 
states was introduced in Ref. [6] and shown to enable approximation of
any single-qubit unitary. This protocol also uses state teleportation
and can achieve on average constant circuit depth. A key consequence of
this work is the ability to prepare a state that enables the
fault-tolerant implementation of the V basis gates.

Recent research on the characterization of $\langle H, T\rangle$
circuits [5J IHj has led to a seminal decomposition result: a
constructive algorithm for efficient ancilla-free compilation of a
given single-qubit unitary into the $\langle H, T\rangle$ basis, with a
T-count guarantee of the form $4\log_2(1/\epsilon) + 11$ for Z
rotations and $12\log_2(1/\epsilon) + K$, where $K \approx 33$, for
general unitaries [4]. Further improvements to this algorithm were
shown in Ref. [5], which presents a less efficient compilation method
that produces shorter ancilla-free approximation circuits with an
expected T count of $9.63\log_2(1/\epsilon) - 20.79$.

Our direct search algorithm (Section V) produces
$\epsilon$-approximation circuits with a V count of
$3\log_5(1/\epsilon)$ in most cases and $4\log_5(2/\epsilon)$ in edge
cases. If a fault-tolerant V gate has the same cost as a fault-tolerant
T gate, then this algorithm gives state-of-the-art circuit depth
asymptotics. Figure 1 plots the T count (V count$^1$) of the
approximation circuits versus the precision $\epsilon$ for several
state-of-the-art {H, T}-based methods and the V-based algorithms
presented in Sections IV and V. The solid blue curve plots the
theoretical bound for the algorithm given in Ref. [4]. The dashed red
curve is based on interpolation of the experimental results given in
Ref. [5]. The dashed green curve plots the theoretical bound (matched
by experimental data) for decomposition into the V basis using our
randomized algorithm (Section IV). The double black curve plots the
average experimental results over 1000 random unitaries from
decomposition into the V basis using our direct search algorithm
(Section V).

From this plot, we see that the T count is substantially 
lower for a given precision when compiling into the V basis. These
curves serve as evidence of the potential improvements in circuit
decomposition from considering other bases, and will hopefully motivate
research into determining an optimal and low-cost fault-tolerant
implementation of a V gate. To the best of our knowledge, there is not yet
a fault-tolerant implementation of the V gate that has 
cost equal to that of the T gate. 

One possible exact implementation [6] requires on average a constant
depth of 3 per V gate, but in turn requires an "offline" cost in T
gates and is probabilistic. If the protocol succeeds (which occurs only
half of the time), then the cost per V gate is only 5.35 T gates,
making the algorithm competitive with [3] and [5]. However, if the
protocol fails, then the cost increases. Details on this implementation
of a V gate are given in Appendix A. In order for decomposition into
the V basis to be competitive with state-of-the-art
$\langle H, T\rangle$ decomposition, it is necessary to determine an
exact, fault-tolerant V gate implementation with a cost less than the
cost of 6 T gates. We proceed by describing the two algorithms for
compiling into the V basis.

1 For illustrative purposes, here we assume one T gate has the
same cost as one V gate.

FIG. 1. T count (V count) versus precision $\epsilon$ for
state-of-the-art single-qubit decomposition methods. Algorithm in
Ref. [4] (solid blue curve), algorithm in Ref. [5] (dashed red curve),
decomposition into the V basis using the randomized algorithm
(Section IV, dashed red curve), decomposition into the V basis using
the direct search algorithm (Section V, dashed green curve). A T gate
is assumed to have equal cost to a V gate.



III. DEFINITIONS AND KEY THEOREMS 

The efficiently universal single-qubit unitary basis introduced in
Refs. [1] and [2] and further developed in Ref. [3] consists of the
following six special unitaries:

$$V_1 = (I + 2iX)/\sqrt{5}, \qquad V_1^{-1} = (I - 2iX)/\sqrt{5},$$
$$V_2 = (I + 2iY)/\sqrt{5}, \qquad V_2^{-1} = (I - 2iY)/\sqrt{5},$$
$$V_3 = (I + 2iZ)/\sqrt{5}, \qquad V_3^{-1} = (I - 2iZ)/\sqrt{5}.$$

We call this basis the V basis.

The subgroup $\langle V\rangle \subset SU(2)$ generated by this basis
is everywhere dense in SU(2) and thus
$\{V_i, V_i^{-1}, i = 1, 2, 3\}$ is a universal basis.

Let the set of W circuits be the set of those circuits generated by
this basis and the Pauli matrices I, X, Y, Z.

It is important to note that the monoid
$\langle W\rangle = \langle X, Y, Z, V_1, V_2, V_3\rangle \subset SU(2)$
contains all of the $\{V_i^{-1}, i = 1, 2, 3\}$ and thus is in fact a
subgroup of SU(2) containing $\langle V\rangle$.
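One way to see why the monoid already contains the inverse V gates is that conjugating a V gate by a different Pauli matrix flips the sign of its Pauli term, for example $Y V_1 Y = V_1^{-1}$. The following is a small numerical sketch of that check in plain Python; the helper names (`matmul`, `v_gate`, `close`) are illustrative and do not come from the original text.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def v_gate(P, sign=+1):
    """(I + sign * 2i P) / sqrt(5) for a Pauli matrix P."""
    s = math.sqrt(5)
    return [[(I2[i][j] + sign * 2j * P[i][j]) / s for j in range(2)]
            for i in range(2)]

def close(A, B, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

# (P, Q): V gate built on P, conjugated by an anticommuting Pauli Q.
pairs = [(X, Y), (Y, Z), (Z, X)]

# Q * V(P) * Q equals V(P)^{-1} because Q P Q = -P.
conjugation_ok = [close(matmul(matmul(Q, v_gate(P)), Q), v_gate(P, -1))
                  for P, Q in pairs]

# (I + 2iP)(I - 2iP)/5 = I, so the sign-flipped gate really is the inverse.
inverse_ok = [close(matmul(v_gate(P), v_gate(P, -1)), I2) for P, _ in pairs]
```

Both lists should contain only `True`, which is consistent with each $V_i^{-1}$ being a word in the Paulis and $V_i$.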

W circuits constitute a slight liberalization of the approach in
Ref. [3], where only circuits in the V basis are considered. Our
justification for the liberalization is that the Pauli operators are a
staple of any quantum computing architecture and can be implemented
fault tolerantly at a very low resource cost in comparison to a
non-Clifford group gate.

It is also noted that the single-qubit Clifford group C in combination
with any of the six V matrices generates a monoid
$\langle C + V\rangle \subset SU(2)$ that is in fact a group,
containing $\langle W\rangle$.

We call the number of V gates the V count of a circuit and denote it as
$V_c$. It is easy to show that an irreducible W circuit contains at
most one non-identity Pauli gate. Thus, if v is the V count of such a
circuit, then the overall depth of the circuit is either v or v + 1.

Throughout, we use trace distance to measure the distance between two
unitaries $U, V \in PSU(2)$:

$$\mathrm{dist}(U, V) = \sqrt{1 - |\mathrm{tr}(U V^\dagger)|/2}, \qquad (1)$$

and call the distance between a target unitary and the approximating
unitary the precision $\epsilon$.

According to [3], any single-qubit unitary can be approximated to a
given precision $\epsilon$ by a V circuit of depth
$O(\log(1/\epsilon))$; however, the proof in Ref. [3] is
non-constructive, and no algorithm for actual synthesis of the
approximating circuits has yet been shown. Here we develop effective
solutions for synthesizing W-circuit approximations of single-qubit
unitaries.

Our solutions are based on the following theorem: 

Theorem 1. A single-qubit unitary gate U can be exactly represented as
a W circuit of V count $V_c \le L$ if and only if it has the form

$$U = (aI + biX + ciY + diZ)\,5^{-L/2}, \qquad (2)$$

where a, b, c, d are integers such that $a^2 + b^2 + c^2 + d^2 = 5^L$.

Thm 1 follows from Thm 2 given below, which also gives rise to a simple
constructive procedure for synthesizing a W circuit that represents
such a U.

We begin by sketching a linear-time subalgorithm for exact W-circuit
synthesis that employs arithmetic of Lipschitz quaternions jTTJKTT].
More specifically, consider the group W of quaternions generated by

$$\pm 1,\ \pm i,\ \pm j,\ \pm k,\ 1 \pm 2i,\ 1 \pm 2j,\ 1 \pm 2k. \qquad (3)$$

Then the following holds:

Theorem 2. (1) W is equal to the set of Lipschitz quaternions with
norms $5^l$ ($l \in \mathbb{Z}$, $l \ge 0$). (2) Consider the group
$W_1 = \{w/\sqrt{\mathrm{norm}(w)} \mid w \in W\}$. Then the subgroup
of gates in PSU(2) representable as exact W-circuits is isomorphic to
the central quotient $W_1/Z(W_1)$, where
$Z(W_1) = \mathbb{Z}_2 = \{1, -1\}$.

Proof. (1) We recall that the quaternion norm is multiplicative and
that $\pm 1, \pm i, \pm j, \pm k$ are the only Lipschitz quaternions of
norm 1. Thus statement (1) is true for $l = 0$.

We prove it for $l = 1$. More specifically, let
$q = a + bi + cj + dk$, $a, b, c, d \in \mathbb{Z}$, and
$\mathrm{norm}(q) = a^2 + b^2 + c^2 + d^2 = 5$.

Decompositions of 5 into sums of squares of four integers are easily
enumerated and we conclude that exactly two of the coefficients in the
list {a, b, c, d} are zero, exactly one is ±1, and exactly one is ±2.
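The enumeration referred to here is small enough to carry out by brute force. The sketch below (variable names are illustrative) lists every integer quadruple with $a^2 + b^2 + c^2 + d^2 = 5$ and records the multiset of absolute values that occurs.

```python
from itertools import product

# All (a, b, c, d) in Z^4 with a^2 + b^2 + c^2 + d^2 = 5.
# Each coordinate has absolute value at most 2, so range(-2, 3) suffices.
solutions = [q for q in product(range(-2, 3), repeat=4)
             if sum(x * x for x in q) == 5]

# Sorted absolute values of each solution: the "shape" of the decomposition.
shapes = {tuple(sorted(abs(x) for x in q)) for q in solutions}

# Solutions whose real part a is +1 or -1 (the case handled first in the proof).
real_part_one = [q for q in solutions if abs(q[0]) == 1]
```

There are 48 solutions, all of shape (0, 0, 1, 2), and 12 of them have real part ±1, matching the twelve quaternions $\pm(1 \pm 2i)$, $\pm(1 \pm 2j)$, $\pm(1 \pm 2k)$.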

If $a = \pm 1$ then we observe that q is equal to one of
$1 \pm 2i$, $1 \pm 2j$, $1 \pm 2k$, $-(1 \pm 2i)$, $-(1 \pm 2j)$,
$-(1 \pm 2k)$ and thus belongs to W.

If one of b, c, d is ±1 we reduce the proof to the previous observation
by multiplying q by one of i, j, k.

For example, if $c = \pm 1$, then the real part of $-jq$ is equal to
$c = \pm 1$.

Consider now a quaternion q with $\mathrm{norm}(q) = 5^l$, $l > 1$.

Let $q = p_1 \cdots p_m$ be a prime quaternion factorization of q.
Since $5^l = \mathrm{norm}(q) = \mathrm{norm}(p_1) \cdots \mathrm{norm}(p_m)$,
for each $i = 1, \ldots, m$ the norm of $p_i$ is either 5 or 1. As we
have shown above (considering $l = 0, 1$), in either case $p_i \in W$.

(2) An effective homomorphism h of $W_1$ onto the W-circuits is the
multiplicative completion of the following map:

$$i \mapsto iX, \qquad j \mapsto iY, \qquad k \mapsto iZ,$$
$$(1 \pm 2i)/\sqrt{5} \mapsto (I \pm 2iX)/\sqrt{5},$$
$$(1 \pm 2j)/\sqrt{5} \mapsto (I \pm 2iY)/\sqrt{5},$$
$$(1 \pm 2k)/\sqrt{5} \mapsto (I \pm 2iZ)/\sqrt{5}.$$

The correctness of this definition of the homomorphism h is verified by
direct comparison of multiplicative relations between the generators of
$W_1$ and
$g(W) = \{iX, iY, iZ, (I \pm 2iX)/\sqrt{5}, (I \pm 2iY)/\sqrt{5}, (I \pm 2iZ)/\sqrt{5}\}$.
These relations happen to be identical.

h is an epimorphism since all of the generators g(W) of the W-circuits
group are by design in its image.

The characterization of Ker(h) is derived from the representation of
quaternions as orthogonal rotations of the 3-dimensional Euclidean
space.

Let us arbitrarily map the units i, j, k into vectors of an orthonormal
basis in the Euclidean space and let us label the corresponding basis
vectors e(i), e(j), e(k). For a quaternion with zero real part
$p = bi + cj + dk$ we write $e(p) = b\,e(i) + c\,e(j) + d\,e(k)$.

Let $H_1$ be the group of quaternions of norm 1 and
$g : H_1 \to SO(3)$ be the representation defined as
$g(q)[e(b)] = e(q\,b\,q^{-1})$.

It is known that g(q) is an orthogonal rotation, that g is a
representation of the group of quaternions of norm 1, and that the
kernel of this representation is the cyclic group
$\mathbb{Z}_2 = \{1, -1\}$.

The group of quantum gates PSU(2) also has a standard orthogonal
representation stemming from its adjoint representation on the Lie
algebra $psu(2) = su(2) = so(3)$. More specifically, if psu(2) is
regarded as the algebra of zero-trace Hermitian matrices, then
$ad : PSU(2) \to Aut(psu(2))$, where Aut denotes the automorphism
group, is $ad(u)[m] = u\,m\,u^{-1}$.

The adjoint representation of PSU(2) is faithful.

If we regard the above homomorphism h as the homomorphism
$h : W_1 \to PSU(2)$ then it is immediate that $ad \circ h = g$ on
$W_1$. Since ad is faithful, i.e., injective, the kernel of h coincides
with $Ker(g) = Z(W_1) = \mathbb{Z}_2 = \{-1, 1\}$. □

Lipschitz quaternions form a division ring, and in view of Thm 2 a
quaternion with norm equal to $5^l$ can be decomposed into a product of
generators in Eq. (3) in l trial division steps.

The decomposition subalgorithm (Algorithm 1) is thus as follows, with
input being a Lipschitz quaternion q of norm $5^l$:



Algorithm 1 Decomposition Subalgorithm

Input: A quaternion q with norm $5^l$
1: ret ← empty list
2: while norm(q) > 1 do
3:   find d in {1 ± 2i, 1 ± 2j, 1 ± 2k} such that d divides q
4:   ret ← {d} + ret
5:   q ← q/d   // divides norm(q) by 5
6: end while
7: if q ≠ 1 then
8:   ret ← {q} + ret
9: end if
10: return ret
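The listing above can be sketched in Python with quaternions stored as integer 4-tuples (a, b, c, d) for a + bi + cj + dk. This is a minimal illustration rather than the authors' implementation: it assumes that "d divides q" means right division, i.e. $q\,\bar{d}/5$ has integer coefficients, and the helper names (`qmul`, `conj`, `norm`, `qprod`, `decompose`) are hypothetical.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as 4-tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def conj(q):
    a, b, c, d = q
    return (a, -b, -c, -d)

def norm(q):
    """Norm in the paper's sense: a^2 + b^2 + c^2 + d^2."""
    return sum(x * x for x in q)

# 1 +- 2i, 1 +- 2j, 1 +- 2k
GENERATORS = [(1, 2, 0, 0), (1, -2, 0, 0), (1, 0, 2, 0),
              (1, 0, -2, 0), (1, 0, 0, 2), (1, 0, 0, -2)]

def decompose(q):
    """Return a list of factors whose ordered product equals q."""
    ret = []
    while norm(q) > 1:
        for d in GENERATORS:
            t = qmul(q, conj(d))          # q * d^{-1} = q * conj(d) / 5
            if all(x % 5 == 0 for x in t):
                ret.insert(0, d)          # prepend, as in line 4
                q = tuple(x // 5 for x in t)
                break
        else:
            raise ValueError("input is not a Lipschitz quaternion of norm 5^l")
    if q != (1, 0, 0, 0):
        ret.insert(0, q)                  # leftover unit (+-1, +-i, +-j, +-k)
    return ret

def qprod(factors):
    """Ordered product of a list of quaternions."""
    out = (1, 0, 0, 0)
    for f in factors:
        out = qmul(out, f)
    return out

# Example: q = j * (1 + 2i) * (1 - 2k), a quaternion of norm 25.
q = qmul(qmul((0, 0, 1, 0), (1, 2, 0, 0)), (1, 0, 0, -2))
factors = decompose(q)
```

The recovered factors need not coincide with the ones used to build q, but their ordered product reproduces q and the number of norm-5 factors equals l.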



Now, given a unitary U as described in Thm 1, we associate with it the
quaternion $q = a + bi + cj + dk$ that has norm $5^L$ and thus belongs
to the subgroup W. It is easy to translate the factorization of q in
the basis given in Eq. (3) into a factorization of U in the W basis.
Thus, the approximation of a target unitary gate G by a W circuit is
constructively reduced to approximating G with a unitary U as described
in Thm 1.

IV. RANDOMIZED APPROXIMATION 
ALGORITHM 

In this section, we present an algorithm for decomposing single-qubit
unitaries into a circuit in the set $\langle C + V\rangle$, where C is
the set of single-qubit Clifford gates and V is one of the V gates. The
expected polynomial runtime is
based on a conjecture, for which we have developed ample 
empirical evidence (based on computer simulation). We 
first present the conjecture and relevant number theory 
background, and then present the compilation algorithm. 

A. Number Theory Background 

Let N be a large positive integer, and $\Delta$ be a relatively small
fixed offset value. Let x, y be standard coordinates on a 2-dimensional
Euclidean plane.

We introduce the circumference

$$C(N, \Delta) = \{(x, y) \mid x^2 + y^2 = (\sqrt{N} - \Delta)^2\}.$$

Let $R(N, \Delta)$ be the circular ring of width $\Delta$ defined as

$$R(N, \Delta) = \{(x, y) \mid (\sqrt{N} - \Delta)^2 < x^2 + y^2 < N\}.$$




FIG. 2. The ring $R(N, \Delta)$ (yellow) and the segment
$A(N, \Delta, P_+)$ (blue), illustrated for values N = 625 and $\Delta = 4$.

Consider a tangent straight line at any point on the circumference
$C(N, \Delta)$. The line divides the plane into two half-planes; let
$P_+$ be the half-plane that does not contain the origin.

Next, define the circular segment

$$A(N, \Delta, P_+) = R(N, \Delta) \cap P_+.$$

The ring $R(N, \Delta)$ and the circular segment $A(N, \Delta, P_+)$
are shown schematically in Figure 2.

We are concerned here with the segments of the standard integer grid
that are contained in $R(N, \Delta)$ and $A(N, \Delta, P_+)$, and their
asymptotic behavior when $N \to \infty$.

We note that the Euclidean area $\mathcal{A}$ of $R(N, \Delta)$ is

$$\mathcal{A}(R(N, \Delta)) = 2\pi\Delta\sqrt{N} + O(\Delta^2)$$

and the Euclidean area of $A(N, \Delta, P_+)$ is

$$\mathcal{A}(A(N, \Delta, P_+)) = \frac{4\sqrt{2}}{3}\,\Delta^{3/2} N^{1/4} + O(\Delta^{5/2} N^{-1/4}).$$

Estimation of the number of integer grid points inside a flat contour
is a known open problem with a rich history [12]. For our purposes, it
suffices to know that the number of integer grid points

$$\{x, y \in \mathbb{Z},\ (x, y) \in R(N, \Delta)\}$$

is asymptotically equal to $\Theta(\Delta\sqrt{N})$ and that the number
of integer grid points

$$\{x, y \in \mathbb{Z},\ (x, y) \in A(N, \Delta, P_+)\}$$

is asymptotically equal to $\Theta(\Delta^{3/2} N^{1/4})$. These claims
can be proven by elementary geometric means.
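As a sanity check on these area estimates, the sketch below evaluates the exact ring and segment areas for the values used in Figure 2 (N = 625, offset 4) and compares them with the leading terms $2\pi\Delta\sqrt{N}$ and $(4\sqrt{2}/3)\Delta^{3/2}N^{1/4}$. It relies on the observation that the segment is an ordinary circular segment of height $\Delta$ cut from the disk of radius $\sqrt{N}$; the function names are illustrative.

```python
import math

def ring_area(N, delta):
    """Exact area between the circles of radius sqrt(N) - delta and sqrt(N)."""
    return math.pi * (N - (math.sqrt(N) - delta) ** 2)

def segment_area(N, delta):
    """Exact area of the circular segment of height delta in a disk of radius sqrt(N)."""
    R = math.sqrt(N)
    return R * R * math.acos((R - delta) / R) - (R - delta) * math.sqrt(2 * R * delta - delta ** 2)

def ring_lattice_count(N, delta):
    """Number of integer points strictly inside the ring."""
    inner = (math.sqrt(N) - delta) ** 2
    m = math.isqrt(N)
    return sum(1 for x in range(-m, m + 1) for y in range(-m, m + 1)
               if inner < x * x + y * y < N)

N, delta = 625, 4
ring_exact = ring_area(N, delta)
ring_leading = 2 * math.pi * delta * math.sqrt(N)
seg_exact = segment_area(N, delta)
seg_leading = 4 * math.sqrt(2) / 3 * delta ** 1.5 * N ** 0.25
count = ring_lattice_count(N, delta)
```

For these values the segment area agrees with its leading term to within a few percent, the ring area differs from $2\pi\Delta\sqrt{N}$ by exactly $\pi\Delta^2$, and the lattice count is close to the ring area.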
Finally, we assume that $N = p^L$, where p is a fixed integer prime
number with $p \equiv 1 \bmod 4$ and L is a large integer.

Consider the set

$$s_4(N) = \{(x, y, z, w) \in \mathbb{Z}^4 \mid x^2 + y^2 + z^2 + w^2 = N\}$$

of all representations of N as a sum of squares of four integers. For
$N = p^L$, the cardinality of the set is

$$\mathrm{card}(s_4(N)) = 8\,(p^{L+1} - 1)/(p - 1) = \Theta(N).$$

This is an immediate consequence of the formula expressing $|s_4(N)|$
as 8 times the sum of divisors of N (see [13]).

The projection of $s_4(N)$ on the (x, y) plane is contained in the
circle of radius $\sqrt{N}$ and the projection of each point is an
integer grid point in that circle. The converse is not true: for
$N = p^L$ there are roughly $2\,(p^{L+1} - 1)/(p - 1)$ projection
points in the circle, while there are roughly $\pi p^L$ integer grid
points (see [14]).
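The cardinality formula for the four-square representation set can be checked by brute force for small prime powers. The sketch below counts signed, ordered representations directly and compares against $8(p^{L+1}-1)/(p-1)$; the function names `r4` and `jacobi_count` are illustrative.

```python
import math

def r4(n):
    """Number of (x, y, z, w) in Z^4 with x^2 + y^2 + z^2 + w^2 = n."""
    count = 0
    m = math.isqrt(n)
    for x in range(-m, m + 1):
        for y in range(-m, m + 1):
            for z in range(-m, m + 1):
                rest = n - x * x - y * y - z * z
                if rest < 0:
                    continue
                w = math.isqrt(rest)
                if w * w == rest:
                    count += 1 if w == 0 else 2   # w and -w
    return count

def jacobi_count(p, L):
    """8 times the sum of divisors of p^L, for an odd prime p."""
    return 8 * (p ** (L + 1) - 1) // (p - 1)

checks = {(p, L): (r4(p ** L), jacobi_count(p, L))
          for p, L in [(5, 1), (5, 2), (5, 3), (13, 1)]}
```

Each pair in `checks` holds the brute-force count and the formula value; for example, both give 48 for N = 5 and 248 for N = 25.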

To determine the complexity of our first algorithm, we require a
conjecture that states, informally, that the density of the
(x, y)-projection points of $s_4(N)$ in the ring $R(N, \Delta)$ and the
segment $A(N, \Delta, P_+)$ is the same as the density of these
projection points in the entire circle of radius $\sqrt{N}$.$^2$ The
conjecture is motivated by the Corollary from Theorem 1 in Ref. [15].
Although the conjecture is presented for a general prime
$p \equiv 1 \bmod 4$, our algorithms are developed for p = 5, thus we
require it to be true only for p = 5.

Conjecture 1. Consider $N = p^L$, where p is a fixed integer prime
number with $p \equiv 1 \bmod 4$ and L is a large even integer. For a
constant $\Delta > 1$, let the four-square decomposition set $s_4(N)$,
the geometric ring $R(N, \Delta)$, and the circular segment
$A(N, \Delta, P_+)$ be defined as above. Let
$\mathrm{Pr}_{x,y}(s_4(N))$ be the projection of $s_4(N)$ onto its
first two coordinates. Then

(1) $\mathrm{card}\left(\mathrm{Pr}_{x,y}(s_4(N)) \cap R(N, \Delta)\right) = \Theta\left(p^{L/2}/L\right)$

(2) $\mathrm{card}\left(\mathrm{Pr}_{x,y}(s_4(N)) \cap A(N, \Delta, P_+)\right) = \Theta\left(p^{L/4}/L\right)$



We conclude this subsection with number theory and experiments that
support the conjecture. Define the set

$$sn(N, \Delta) = \{a^2 + b^2 \mid a, b \in \mathbb{Z},\ (a, b) \in R(N, \Delta)\}.$$

It is easy to see that

$$sn(N, \Delta) \subset [p^L - 2\Delta p^{L/2},\ p^L].$$



2 To show that the algorithm requires expected polynomial time, a 
weaker form of Conjecture 1 may be considered; the weaker claim
is that the density of the projection points in the ring and the 
segment is at most polylogarithmically lower than their density 
in the circle. 



If $2\Delta p^{L/2} < p^L$, then the conditions of Thm 1 in Ref. [15]
are satisfied and the Corollary implies that the cardinality of the set
is

$$\mathrm{card}(sn(N, \Delta)) = \Theta\left(\frac{p^{L/2}}{\sqrt{\log p^L}}\right) = \Theta\left(\frac{p^{L/2}}{\sqrt{L}}\right).$$

Thus, there are as many distinct circumferences in $R(N, \Delta)$ that
contain integer grid points, implying that the number of integer grid
points on any one of these circumferences is $\Theta(L^{1/2})$ on
average.
We further note that the set

$$v(L) = \{p^L - a^2 - b^2 \mid a, b \in \mathbb{Z},\ (a, b) \in R(N, \Delta)\}$$

has cardinality

$$m = \mathrm{card}(v(L)) = \Theta\left(\frac{p^{L/2}}{\sqrt{L}}\right).$$

Values from v(L) are contained in the interval
$[0, 2\Delta p^{L/2}]$. The average density of integers in that segment
that are representable as a sum of two squares of integers is
$\Theta(1/\sqrt{\log N}) = \Theta(1/\sqrt{L})$ [16]. Assuming that the
density of such integers across the set v(L) is the same, we infer from
the assumption that there are $m/\sqrt{L} = \Theta(p^{L/2}/L)$ values
in v(L) that are so representable, and hence at least as many integer
grid points $(a, b) \in R(N, \Delta)$ that are projections of some
four-square decomposition of $p^L$ (i.e., such that there exist
$c, d \in \mathbb{Z}$ with $p^L - a^2 - b^2 = c^2 + d^2$).



To verify statement (1) of Conjecture 1 we ran extensive computer
simulations for p = 5 and $L \in \{16, 28\}$, and for p = 13 and
$L \in \{12, 18\}$, using Mathematica infinite precision integer
arithmetic, and observed behavior consistent with the conjecture. To
motivate statement (2) of Conjecture 1 we tested the polar angles of
points in $\mathrm{Pr}_{x,y}(s_4(N))$ for uniformity. The simulation
covered $N = 5^{16}, 5^{28}, 13^{12}, 13^{18}$ and tested the null
hypothesis that the distribution of the polar angles is uniform. Based
on Kolmogorov-Smirnov statistics, the null hypothesis could not be
rejected at any meaningful level of significance.



B. The Algorithm 

We now present the expected-polynomial time algorithm. We begin by
approximating an arbitrary Z-rotation with a $\langle C + V\rangle$
circuit.

Problem 1. Given a Z-rotation $G = R_z(\theta)$ and a small enough$^3$
target precision $\epsilon$, synthesize a $\langle C + V\rangle$
circuit $c(G, \epsilon)$ such that

$$\mathrm{dist}(c(G, \epsilon), G) < \epsilon \qquad (4)$$

and

$$V_c(c(G, \epsilon)) \le 4\log_5(2/\epsilon). \qquad (5)$$

3 Although we do not have a closed form bound on how small $\epsilon$
should be, our algorithm works well in practice for
$\epsilon < 2 \cdot 5^{-4} = 0.0032$.

Theorem 3. There exists a randomized algorithm that solves Problem 1 in
expected time polynomial in $\log(1/\epsilon)$.

We first present geometry that relates the theorem to Conjecture 1 with
p = 5. Our goal is to select the target circuit depth value L such that

$$\epsilon < 2 \cdot 5^{-L/4}. \qquad (6)$$

Having found the smallest integer L satisfying Eq. (6), we then
represent G as $G = \cos(\theta/2)\,I + i\sin(\theta/2)\,Z$ and
consider approximating it with

$$U = (aI + biX + ciY + diZ)\,5^{-L/2},$$

as suggested by Thm 1.

Approximating G to precision $\epsilon$ in the trace distance metric is
equivalent to finding U such that

$$\left(a\cos(\theta/2) + d\sin(\theta/2)\right) 5^{-L/2} > 1 - \epsilon^2.$$

For convenience we note that, without loss of generality, it suffices
to prove the theorem for $-\pi/2 < \theta < \pi/2$, since we can always
rotate the target gate to a position within this interval using
$R_z(\pm\pi/2)$ rotations from the Clifford group. We also note that
our selection of L ensures that $5^{L/4}\epsilon \approx 2$.

Denote by $A_\epsilon(\theta)$ the segment of the unit disk where
$x\cos(\theta/2) + y\sin(\theta/2) > 1 - \epsilon^2$. Let D(L) be an
isotropic dilation of the plane with coefficient $5^{L/2}$. Then the
area of $D(L)[A_\epsilon(\theta)]$ is

$$\mathcal{A}(D(L)[A_\epsilon(\theta)]) = 5^L\,\frac{4\sqrt{2}}{3}\,\epsilon^3 \approx \frac{32\sqrt{2}}{3}\,5^{L/4}.$$
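Define the angle $\phi = \sqrt{2}\,\epsilon\,(1 - \epsilon^2/4)$ and
the interval

$$I_w(\epsilon, \theta) = \left(5^{L/2}\sin(\theta/2 - \phi),\ 5^{L/2}\sin(\theta/2 + \phi)\right)$$

with subinterval
$\left(5^{L/2}\sin(\theta/2 - \epsilon),\ 5^{L/2}\sin(\theta/2 + \epsilon)\right)$.

The length of the latter is approximately
$2 \cdot 5^{L/2}\cos(\theta/2)\,\epsilon > 2\sqrt{2} \cdot 5^{L/4}$ and
it contains approximately at least as many integer values.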

Given any integer a such that

$$5^{L/2}\sin(\theta/2 - \epsilon) < a < 5^{L/2}\sin(\theta/2 + \epsilon),$$

we derive geometrically that the intersection of the horizontal line
w = a with $D(L)[A_\epsilon(\theta)]$ is a straight line segment that
is longer than 2 and that it contains at least two integer grid points.

We are now ready to prove the theorem.



Proof. Revisiting the notations of the previous subsection, we
introduce the set of all representations of $5^L$ as a sum of squares
of four integers

$$s_4(5^L) = \{(x, y, z, w) \in \mathbb{Z}^4 \mid x^2 + y^2 + z^2 + w^2 = 5^L\}.$$

The key step in the algorithmic proof below is finding a point (a, d)
in the intersection of $\mathrm{Pr}_{x,y}(s_4(5^L))$ and
$D(L)[A_\epsilon(\theta)]$.

Once such a point is found we can use a Rabin-Shallit algorithm [17] to
express $5^L - a^2 - d^2$ as $b^2 + c^2$, $b, c \in \mathbb{Z}$. Then
$U = (aI + biX + ciY + diZ)\,5^{-L/2}$ would be the desired
approximation of G, which can be represented precisely as a W circuit
in at most L quaternion division steps.

Consider a horizontal line w = a, where $a \in I_w(\epsilon, \theta)$.
By simple geometric calculation we find that the intersection of this
line with the $D(L)[A_\epsilon(\theta)]$ segment is a line segment that
is at most
$5^{L/2}\epsilon^2/\cos(\theta/2) < 5^{L/2}\sqrt{2}\,\epsilon^2$ long.

For our choice of L this maximum length is approximately $4\sqrt{2}$
and thus the line segment contains at most 5 points with integer first
coordinate. On the other hand, we have shown earlier that for

$$5^{L/2}\sin(\theta/2 - \epsilon) < a < 5^{L/2}\sin(\theta/2 + \epsilon),$$

the intersection of the w = a line with $D(L)[A_\epsilon(\theta)]$ is a
line segment that is longer than 2 and must contain at least 2 points
with integer z coordinate. In other words, if
$a \in I_w(\epsilon, \theta)$ is a randomly selected integer, then with
probability at least $1/\sqrt{2}$ the intersection segment contains at
least 2 integer grid points.

Algorithm 2 gives the randomized approximation algorithm.

Algorithm 2 Randomized Approximation

Input: Accuracy $\epsilon$, angle $\theta$
1: completion ← null
2: $S_w$ ← set of all integers in $I_w(\epsilon, \theta)$
3: while completion == null and $S_w \ne \emptyset$ do
4:   Randomly pick an integer a from $S_w$
5:   $S_w$ ← $S_w$ − {a}
6:   for all integer d such that (d, a) ∈ $D(L)[A_\epsilon(\theta)]$ do
7:     if exist $b, c \in \mathbb{Z}$ such that $5^L - a^2 - d^2 = b^2 + c^2$ then
8:       completion ← (b, c)
9:       Break
10:     end if
11:   end for
12: end while
13: if completion == null then
14:   return null
15: end if
16: b ← first(completion)
17: c ← last(completion)
18: return $U = (aI + biX + ciY + diZ)\,5^{-L/2}$
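The randomized search can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not in the original: L is taken as the smallest even integer with $2 \cdot 5^{-L/4} \le \epsilon$; the coefficient of Z (called `z` here, close to $5^{L/2}\sin(\theta/2)$) plays the role of the randomly chosen line, while the coefficient of I is scanned along it; and a brute-force two-square test stands in for the Rabin-Shallit algorithm. All function names are hypothetical.

```python
import math
import random

def two_squares(n):
    """Return (b, c) with b*b + c*c == n, or None. Brute-force stand-in."""
    b = 0
    while 2 * b * b <= n:
        rest = n - b * b
        c = math.isqrt(rest)
        if c * c == rest:
            return (b, c)
        b += 1
    return None

def precision(a, z, L, theta):
    """Trace distance between (aI + ... + z iZ) 5^{-L/2} and R_z(theta)."""
    overlap = (a * math.cos(theta / 2) + z * math.sin(theta / 2)) * 5 ** (-L / 2)
    return math.sqrt(max(0.0, 1 - abs(overlap)))

def approximate_rz(theta, eps, seed=0):
    """Search for integers (a, b, c, z) with a^2+b^2+c^2+z^2 = 5^L and dist < eps."""
    rng = random.Random(seed)
    L = math.ceil(4 * math.log(2 / eps, 5))
    if L % 2:
        L += 1                                # assumption: keep 5^(L/2) an integer
    N, R = 5 ** L, 5 ** (L // 2)
    ct, st = math.cos(theta / 2), math.sin(theta / 2)
    phi = math.acos(1 - eps * eps)            # half-angle of the cap
    lo = math.ceil(R * math.sin(theta / 2 - phi))
    hi = math.floor(R * math.sin(theta / 2 + phi))
    lines = list(range(lo, hi + 1))
    rng.shuffle(lines)                        # random order of candidate lines
    for z in lines:
        # Points on this line inside the cap: a*ct + z*st > R(1 - eps^2), a^2 + z^2 <= N.
        a_min = math.floor((R * (1 - eps * eps) - z * st) / ct) + 1
        a_max = math.isqrt(N - z * z)
        for a in range(a_min, a_max + 1):
            bc = two_squares(N - a * a - z * z)
            if bc is not None:
                return L, (a, bc[0], bc[1], z)
    return None

# Coarse precision keeps the brute-force two-square test fast.
result = approximate_rz(theta=1.0, eps=0.05)
```

The returned integers define $U = (aI + biX + ciY + ziZ)\,5^{-L/2}$, which can then be factored with the decomposition subalgorithm. The sketch assumes $-\pi/2 < \theta < \pi/2$, as in the text.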



In the worst case the algorithm terminates by exhausting the
$O(5^{L/4})$ candidate points in the $D(L)[A_\epsilon(\theta)]$
segment. However, we note that this segment is that of Conjecture 1 for
p = 5. Therefore the share of satisfactory candidates among all of the
integer grid points in $D(L)[A_\epsilon(\theta)]$ is $\Theta(1/L)$.
Thus the algorithm will terminate in $O(L)$ iterations on average.

Since the average overall number of iterations is moderate, the largest
cost in the algorithm is line 7. It has been shown by Rabin and Shallit
[17] that the effective test for an integer v to be a sum of squares of
two integers has expected running cost of
$O(\log^2(v)\log(\log(v)))$. In our case $v < 8 \cdot 5^{L/2}$ and we
estimate the expected cost of the step as $O(L^2\log(L))$. Therefore
the overall expected cost of the algorithm is $O(L^3\log(L))$, which
translates into $O(\log(1/\epsilon)^3\log(\log(1/\epsilon)))$.

□



C. Experimental Results 

We have implemented Algorithm 2 from Thm 3 in Mathematica. Our
implementation has the following simplifications:

• Line 7 has been redefined to return PrimeQ[$5^L - a^2 - d^2$] for
even a and d and to return false otherwise.$^4$

• Given a desired V count $V_c$, the algorithm terminates whenever a
random candidate at distance less than $2 \cdot 5^{-V_c/4}$ from the
target is picked.
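The primality shortcut works because $5^L \equiv 1 \bmod 4$, so for even a and d the remainder $5^L - a^2 - d^2$ is also $\equiv 1 \bmod 4$, and any prime of that form is a sum of two squares (Fermat's two-square theorem). The sketch below checks this on a small case; the trial-division `is_prime` is only a stand-in for Mathematica's PrimeQ, and L = 4 is an arbitrary illustrative choice.

```python
import math

def is_prime(n):
    """Trial-division primality test (stand-in for PrimeQ)."""
    if n < 2:
        return False
    for f in range(2, math.isqrt(n) + 1):
        if n % f == 0:
            return False
    return True

def two_squares(n):
    """Return (b, c) with b*b + c*c == n, or None."""
    for b in range(math.isqrt(n) + 1):
        rest = n - b * b
        c = math.isqrt(rest)
        if c * c == rest:
            return (b, c)
    return None

L = 4
N = 5 ** L
hits = []                                   # (a, d, remainder, (b, c))
for a in range(0, math.isqrt(N) + 1, 2):    # even a
    for d in range(0, math.isqrt(N) + 1, 2):    # even d
        rest = N - a * a - d * d
        if rest > 1 and is_prime(rest):
            hits.append((a, d, rest, two_squares(rest)))
```

Every prime remainder found this way is congruent to 1 modulo 4 and comes with an explicit two-square completion, for example 617 for a = d = 2.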

We implicitly used the Rabin primality test since it is in general
faster than complete integer factorization. We ran our Mathematica
solution over a set of 1000 random axial unitary rotations at 17
different circuit $V_c$ levels. The test statistics are presented in
Figure 3. The solid blue line represents the interpolated average
precision achieved over the test set. The sizes of the markers are
proportional to the standard deviations of the precision at each level.
The dashed red line shows the theoretical precision bound of
$2 \cdot 5^{-V_c/4}$. Note that the tight match between the theoretical
estimate and experimental results is not very insightful since the
algorithm has been designed to terminate as soon as the theoretical
precision has been achieved.

The algorithm can be used for approximate decomposition of any
single-qubit unitary into a $\langle C + V\rangle$ circuit since any
$G \in SU(2)$ can be decomposed exactly into three axial rotations, and
the algorithm can be applied to each axial component. The V count in
this case will scale as

$$V_c \le 12\log_5(2/\epsilon). \qquad (7)$$



4 PrimeQ is the Mathematica primality test, which does not require
complete factorization of the integer being tested. Mathematica is a
registered trademark of Wolfram Research, Inc.







FIG. 3. V count versus mean precision e (measured in trace 
distance). Results are presented for 1000 random axial ro- 
tations at 17 values of V count. Solid blue line: interpo- 
lated average precision. Dashed red line: theoretical bound 
on precision, $2 \cdot 5^{-V_c/4}$. Marker sizes are proportional to the
standard deviations of the precision at each V count. 



For the majority of unitary gates, we can significantly 
reduce the depth of the output circuit with a correspond- 
ing increase in compilation time. The V count estimate 
given in Eq. (7) reflects the tripling of the circuit depth due to
decomposition of the target unitary into three axial rotations. An
alternative approach would be to perform a direct search in the
four-dimensional integer grid; this will be the basis for our second
algorithm described in Section V.



D. A Possible Generalization 

A foundation for efficient circuit synthesis over the V basis is the
set of quaternions of norm $5^L$ and a body of number theory facts and
conjectures related to that set. Given an integer prime p such that
$p \equiv 1 \bmod 4$, it is apparent that most of these facts and
observations generalize to quaternions of norm $p^L$, which are
generated by the primitive ones of norm p. Modulo Lipschitz units there
are p + 1 such quaternions in the generator set. These correspond to a
basis of p + 1 unitary operators that we denote V(p). Together with the
Pauli gates, a subset of (p + 1)/2 of the V(p) operators generates the
generalization of the W circuits.

However, in the case of p = 5, it was sufficient to add only one V
operator in order to ensure the asymptotic uniformity of the grid of
$\langle C + \{V\}\rangle$ circuits. For p > 5, additional independent
V(p) operators are required.

For example, when p = 13 the following gates are required, in addition
to the Clifford group:

$$V_1(13) = (2I + 3iZ)/\sqrt{13},$$
$$V_2(13) = (I + 2i(X + Y + Z))/\sqrt{13},$$
$$V_3(13) = (2I + iX + 2i(Y + Z))/\sqrt{13},$$
$$V_4(13) = (2I + iY + 2i(X + Z))/\sqrt{13},$$
$$V_5(13) = (2I + iZ + 2i(X + Y))/\sqrt{13}.$$

A generalization of Thm 2 characterizes the gates representable exactly
in the $\langle C + \{V(p)\}\rangle$ basis as normalizations of
Lipschitz quaternions of norm $p^L$, $L \in \mathbb{Z}$. The exact
synthesis of the corresponding circuit for a unitary of the form

$$U = (aI + biX + ciY + diZ)/p^{L/2}$$

amounts to a generalization of Algorithm 1 and requires at most
$(p + 1) \cdot L$ quaternion divisions.

Thm 3 also generalizes to $\langle C + \{V(p)\}\rangle$ circuits, to
the extent that Conjecture 1 holds for the prime parameter p, and the
circuit depth estimate from the theorem generalizes to an estimate of
the form $L \le 4\log_p(2/\epsilon)$.

We have chosen to focus on the V(5) case for two reasons. First, the
basis requires only one non-Clifford gate, for which we have a
fault-tolerant implementation protocol. Second, we have so far only
collected empirical data for p = 5.



V. DIRECT SEARCH APPROXIMATION 
ALGORITHM 

In this section, we present an algorithm based on optimized brute-force
search for decomposing single-qubit unitaries into a circuit in the set
$\langle P + V\rangle$, where P is the set of single-qubit Pauli gates
and V is one of the V gates. We first present relevant background.



A. Vicinity of a Unitary in PSU(2) as a Spherical Cap

We begin by characterizing an $\epsilon$-neighborhood of a single-qubit
unitary as a "spherical cap" in a 3-dimensional
sphere $S^3$, i.e., as a portion of the sphere to one side of
a certain 3-dimensional hyperplane in the 4-dimensional
Euclidean space.

Consider the 4-dimensional Euclidean space with standard
coordinates $\alpha, \beta, \gamma, \delta$.

Let

$$S^3(R) = \{(\alpha, \beta, \gamma, \delta) \mid \alpha^2 + \beta^2 + \gamma^2 + \delta^2 = R^2\}$$

be the 3-dimensional sphere of radius R centered at the
origin. For any point on $S^3(R)$ we generate the unitary

$$\nu(\alpha, \beta, \gamma, \delta) = (\alpha I + i\beta X + i\gamma Y + i\delta Z)/R \in SU(2).$$
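
The map from a point of the sphere to a unitary can be made concrete with a few lines of code. The sketch below is illustrative only: the function names `nu`, `det2`, and `is_unitary` are assumptions introduced here, and the matrix entries follow from the standard Pauli matrices.

```python
import math


def nu(alpha, beta, gamma, delta):
    """Return (alpha*I + i*beta*X + i*gamma*Y + i*delta*Z)/R as a 2x2 nested list.

    R is the Euclidean norm of (alpha, beta, gamma, delta), so the input may be
    any nonzero point of 4-space, for example an integer grid point.
    """
    R = math.sqrt(alpha**2 + beta**2 + gamma**2 + delta**2)
    return [
        [complex(alpha, delta) / R, complex(gamma, beta) / R],
        [complex(-gamma, beta) / R, complex(alpha, -delta) / R],
    ]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def is_unitary(m, tol=1e-12):
    """Check that m times its conjugate transpose is the identity."""
    for i in range(2):
        for j in range(2):
            entry = sum(m[i][k] * m[j][k].conjugate() for k in range(2))
            if abs(entry - (1 if i == j else 0)) > tol:
                return False
    return True


# The integer point (1, 2, 0, 0) on the sphere of radius sqrt(5) gives (I + 2iX)/sqrt(5).
U = nu(1, 2, 0, 0)
# The antipodal point gives -U, which is the same element of PSU(2).
U_antipode = nu(-1, -2, 0, 0)
```

Antipodal points differing only by a global sign is what makes the map a two-to-one covering of PSU(2).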



The quantum gate group PSU(2) is the central quotient
of SU(2) with the exact sequence
$1 \to \mathbb{Z}_2 \to SU(2) \to PSU(2) \to 1$;
therefore $\nu$ defines a $\mathbb{Z}_2$ covering
of PSU(2) (which is the same factorization that is commonly
used to glue an $S^3$ into 3-dimensional projective
space).

Under this covering the PSU(2) unitaries with nonzero
trace are in one-to-one correspondence with the "northern"
hemisphere

$$S^3_+(R) = \{(\alpha, \beta, \gamma, \delta) \in S^3(R) \mid \alpha > 0\}.$$

Thus, given a gate G with |tr(G)| > 0, a small enough
$\epsilon$-vicinity of that gate

$$c_\epsilon(G) = \{U \in PSU(2) \mid dist(U, G) < \epsilon\}$$

is unambiguously identified with a spherical cap in
$S^3_+(R)$.

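To clarify, consider $G = \nu(\alpha, \beta, \gamma, \delta)$ and define

$$C_\epsilon(G) = \{(\alpha', \beta', \gamma', \delta') \in S^3_+(R) \mid
\alpha\alpha' + \beta\beta' + \gamma\gamma' + \delta\delta' > R(1 - \epsilon^2)\}.$$

Then $C_\epsilon(G)$ is a portion of $S^3(R)$ bounded by the hyperplane

$$\alpha\alpha' + \beta\beta' + \gamma\gamma' + \delta\delta' = R(1 - \epsilon^2)$$

and $\nu(C_\epsilon(G)) = c_\epsilon(G)$.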

We will focus further calculations on the $\epsilon$-neighborhoods
that do not contain zero-trace gates and
thus correspond to spherical caps completely contained
in $S^3_+(R)$. It is trivial to modify all of the equations to
cases where an $\epsilon$-neighborhood intersects the zero-trace
"equator".

Given a $C_\epsilon(G)$ that is completely contained in $S^3_+(R)$,
it is easy to derive, geometrically, that the metric volume
V of $C_\epsilon(G)$ is

$$V(C_\epsilon(G)) = 4\pi R^3 \int_0^{\cos^{-1}(\epsilon')} \sin^2(\eta)\, d\eta
= 2\pi R^3 \left( \cos^{-1}(\epsilon') - \tfrac{1}{2} \sin(2\cos^{-1}(\epsilon')) \right),$$

where $\epsilon' = 1 - \epsilon^2$.

Taking the Taylor series expansion of the latter at $\epsilon = 0$,
we find that

$$V(C_\epsilon(G)) = \frac{8\sqrt{2}}{3} \pi R^3 \epsilon^3 + O(\epsilon^5).$$
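
The closed form and its leading Taylor term can be compared numerically. The following is a minimal sketch under the formulas stated in this subsection; the function names are illustrative, and the hemisphere volume $\pi^2 R^3$ is half the standard volume $2\pi^2 R^3$ of a 3-sphere.

```python
import math


def cap_volume(R, eps):
    """Closed-form metric volume of the spherical cap C_eps(G) on S^3(R)."""
    eta0 = math.acos(1.0 - eps**2)
    return 2.0 * math.pi * R**3 * (eta0 - 0.5 * math.sin(2.0 * eta0))


def cap_volume_leading(R, eps):
    """Leading term of the Taylor expansion at eps = 0."""
    return 8.0 * math.sqrt(2.0) * math.pi * R**3 * eps**3 / 3.0


def hemisphere_volume(R):
    """Volume of the hemisphere S^3_+(R), i.e. half of 2*pi^2*R^3."""
    return math.pi**2 * R**3


def relative_share(R, eps):
    """Fraction of the hemisphere occupied by one eps-cap."""
    return cap_volume(R, eps) / hemisphere_volume(R)


# For small eps the closed form and the leading term agree closely.
ratio = cap_volume(1.0, 1e-2) / cap_volume_leading(1.0, 1e-2)
```

Because the ratio depends only on $\epsilon$, the same comparison holds for any radius R.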

In the next sections we focus on precision targets $\epsilon$ for
which the $C_\epsilon(G)$ neighborhoods have sufficient metric
volume.



B. A Bound for Uniform Precision 

We start by establishing that there exist unitary gates
in PSU(2) that cannot be approximated by W-circuits
of $V_c \leq L$ to a precision better than $\epsilon_L = 5^{-L/4}/2$. This
is based on the following observation:

Observation 1. Let w be a W-circuit different from the
identity with $V_c(w) \leq L$; then it evaluates to U(w) with
$|tr(U(w))| \leq 2(1 - 5^{-L/2})$, and U(w) is at least $5^{-L/4}$
away from the identity.

Indeed,

$$U(w) = \frac{a}{5^{L/2}} I + \frac{i(bX + cY + dZ)}{5^{L/2}}, \quad a, b, c, d \in \mathbb{Z}.$$

Since U(w) is not the identity, |a| cannot be greater than
$5^{L/2} - 1$.

Now, let $P \in \{I, X, Y, Z\}$ be a Pauli gate.

Observation 2. A circuit w with $V_c(w) \leq L$ and distinct
from P evaluates to U(w) with a distance at least $5^{-L/4}$
from P.

Indeed, if w is a W-circuit at a certain distance from
P then w.P is a circuit with the same V count at the
same distance from the identity.

Thus, if $\epsilon < \epsilon_L = 5^{-L/4}/2$ and $G \in PSU(2)$ is any
unitary such that

$$\epsilon < dist(G, P) < 2\epsilon_L - \epsilon,$$

then there are no W-circuits of $V_c \leq L$ within distance $\epsilon$
from G by the triangle inequality for dist:

$$\forall w, \quad dist(w, G) \geq dist(w, P) - dist(G, P) > \epsilon.$$

On the other hand, dist(G, P) is also greater than
$\epsilon$. Therefore, the uniform precision guarantee cannot be
better than $5^{-L/4}/2$ for W-circuits of $V_c \leq L$. In other
words, the uniform guarantee of optimal circuit depth
cannot be better than $4\log_5(1/\epsilon) - 4\log_5(2)$.
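
The integrality argument behind Observation 1 can be checked by brute force for small even L, where $5^{L/2}$ is an integer and the argument applies verbatim. The sketch below is illustrative rather than part of the paper's method: it assumes the distance $dist(U, I) = \sqrt{1 - |tr(U)|/2}$ (the form consistent with the trace bound in Observation 1), enumerates all integer points with $a^2 + b^2 + c^2 + d^2 = 5^L$, and uses hypothetical helper names throughout.

```python
import math


def lattice_points(n):
    """All integer tuples (a, b, c, d) with a^2 + b^2 + c^2 + d^2 == n."""
    r = math.isqrt(n)
    points = []
    for a in range(-r, r + 1):
        for b in range(-r, r + 1):
            for c in range(-r, r + 1):
                rest = n - a * a - b * b - c * c
                if rest < 0:
                    continue
                d = math.isqrt(rest)
                if d * d == rest:
                    points.append((a, b, c, d))
                    if d != 0:
                        points.append((a, b, c, -d))
    return points


def dist_to_identity(a, L):
    """Assumed metric: sqrt(1 - |tr U| / 2) with tr U = 2a / 5^(L/2)."""
    return math.sqrt(max(0.0, 1.0 - abs(a) / 5 ** (L / 2)))


def min_distance_from_identity(L):
    """Smallest distance to the identity over grid points other than +/- identity."""
    n = 5 ** L
    return min(
        dist_to_identity(a, L)
        for a, b, c, d in lattice_points(n)
        if a * a != n
    )


def depth_lower_bound(eps):
    """The bound 4*log_5(1/eps) - 4*log_5(2) on a uniform depth guarantee."""
    return 4 * math.log(1 / eps, 5) - 4 * math.log(2, 5)
```

For L = 2 and L = 4 the minimum is attained with equality, by the points with |a| = 4 and |a| = 24 respectively.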

Revisiting the above discussions, we note that for
$\epsilon < \epsilon_L = 5^{-L/4}/2$ there exist "exclusion zones" of width
$2(\epsilon_L - \epsilon)$ around each of the Pauli gates consisting of
unitaries that cannot be approximated to precision $\epsilon$ by
W-circuits with $V_c \leq L$. Using the spherical cap volume
formulae from the previous subsection, for $\epsilon$ significantly
smaller than $\epsilon_L$, we estimate the combined volume of
these exclusion zones, relative to the volume of the
hemisphere, as $O(5^{-L/2}(5^{-L/4} - 3\epsilon))$.



C. A Working Conjecture 

Given the set of W-circuits with $V_c \leq L$, we will consider
two key precision targets: $\epsilon_4(L) = 2 \cdot 5^{-L/4}$ and
$\epsilon_3(L) = 5^{-L/3}$.

Consider the 3-dimensional hemisphere $S^3_+(5^{L/2})$. As
per the results from Subsection A, for the metric
volumes of the $\epsilon_4$- and $\epsilon_3$-neighborhoods we have

$$V(C_{\epsilon_4(L)}(G)) \sim 64\pi\sqrt{2}\, 5^{3L/4}/3$$

and

$$V(C_{\epsilon_3(L)}(G)) \sim 8\pi\sqrt{2}\, 5^{L/2}/3.$$

Since the volume of $S^3_+(5^{L/2})$ is equal to $\pi^2 5^{3L/2}$, the
relative metric shares that these neighborhoods occupy on
the hemisphere are

$$\frac{V(C_{\epsilon_4(L)}(G))}{V(S^3_+(5^{L/2}))} \sim \frac{64\sqrt{2}\, 5^{-3L/4}}{3\pi}$$

and

$$\frac{V(C_{\epsilon_3(L)}(G))}{V(S^3_+(5^{L/2}))} \sim \frac{8\sqrt{2}\, 5^{-L}}{3\pi},$$

respectively.

Conjecture 2. (1) There exists a positive integer $L_4$
such that for any integer $L > L_4$ and any single-qubit
gate G there exists a W-circuit w such that

$$dist(G, w) < \epsilon_4(L).$$

(2) For large enough integer L ($L > L_3$) there exists
an open subset $G_3 \subset PSU(2)$ with metric volume
$(1 - o(1)) V(S^3_+)$ (when $L \to \infty$) such that for each $G \in G_3$
there exists a W-circuit w with

$$dist(G, w) < \epsilon_3(L).$$

The common motivation for both clauses of this conjecture
is that the number of distinct W-circuits scales
as $5^{V_c}$. More specifically, there are approximately $5 \cdot 5^L$
distinct unitaries in PSU(2) that are represented exactly
by W-circuits with $V_c \leq L$.

This stems from the fact that $5^L$ has exactly $10 \cdot 5^L - 2$
distinct decompositions into a sum of four squares of integers,
which can be easily derived from the Jacobi formula
for the $r_4$ function:

$$r_4(n) = 8 \sum_{SC(d)} d, \qquad SC(d) = (d \mid n) \,\&\, (d \bmod 4 \neq 0)$$

(see chapters on the r(n) function in Ref. [T^|). Geometrically,
there are exactly $10 \cdot 5^L - 2$ distinct integer grid
points on $S^3(5^{L/2})$ and the set of such grid points is
centrally symmetric with respect to the origin, so approximately
half of these integer grid points lie on the
hemisphere $S^3_+(5^{L/2})$.
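
The count of grid points is easy to confirm for small L. For $n = 5^L$ no divisor is a multiple of 4, so the Jacobi formula gives $r_4(5^L) = 8(1 + 5 + \dots + 5^L) = 10 \cdot 5^L - 2$. The sketch below compares that value against a direct enumeration; the function names are illustrative, and the brute-force loop is only practical for small n.

```python
import math


def r4_bruteforce(n):
    """Count ordered integer solutions of a^2 + b^2 + c^2 + d^2 == n."""
    r = math.isqrt(n)
    count = 0
    for a in range(-r, r + 1):
        for b in range(-r, r + 1):
            for c in range(-r, r + 1):
                rest = n - a * a - b * b - c * c
                if rest < 0:
                    continue
                d = math.isqrt(rest)
                if d * d == rest:
                    # d and -d are distinct solutions unless d == 0
                    count += 1 if d == 0 else 2
    return count


def r4_jacobi(n):
    """Jacobi's formula: 8 times the sum of divisors of n not divisible by 4."""
    return 8 * sum(d for d in range(1, n + 1) if n % d == 0 and d % 4 != 0)


# Compare both counts with the closed form 10 * 5^L - 2 for small L.
counts = {L: (r4_bruteforce(5 ** L), r4_jacobi(5 ** L), 10 * 5 ** L - 2)
          for L in (1, 2, 3)}
```

Half of each count, roughly $5 \cdot 5^L$, is the number of grid points on one hemisphere.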

Further intuition in support of the conjectures is drawn
from [TJ |2], which investigates the distribution density
of the elements of the free group generated by

$$(V_1, V_2, V_3, V_1^{-1}, V_2^{-1}, V_3^{-1}).$$

A stronger special case of Conjecture 2 postulates that
for any

$$G = \nu(\alpha, \beta, \gamma, \delta) = (\alpha I + i\beta X + i\gamma Y + i\delta Z),$$

where $\alpha^2 + \beta^2 + \gamma^2 + \delta^2 = 1$, there is an integer grid
point on $S^3(5^{L/2})$ within distance < 2 of $(\alpha, \beta, \gamma, \delta) \cdot 5^{L/2}$.
Although we do not claim that this stronger statement is
true for all unitaries G, the perceived near-uniformity
of the distribution of the integer lattice grid points over
$S^3(5^{L/2})$ for large enough L makes it plausible for most
unitaries.



D. Algorithm Outline

Our algorithm to address Problem 2 below employs
optimized direct search.

Problem 2. Given an arbitrary single-qubit unitary $G \in PSU(2)$
and a small enough target precision $\epsilon$, synthesize
a W circuit $c(G, \epsilon)$ such that

$$dist(c(G, \epsilon), G) < \epsilon \qquad (8)$$

and the V count of the resulting circuit is

$$V_c \leq 3 \log_5(1/\epsilon) \qquad (9)$$

for the majority of target unitaries and

$$V_c \leq 4 \log_5(2/\epsilon) \qquad (10)$$

in edge cases.



Let L be the intended V count of the desired approximation
circuit. Given a target single-qubit unitary gate
represented as $G = \alpha I + \beta iX + \gamma iY + \delta iZ$, in order to
find integers (a, b, c, d) such that $a^2 + b^2 + c^2 + d^2 = 5^L$
and

$$dist(G, (aI + b\,iX + c\,iY + d\,iZ)\, 5^{-L/2}) < \epsilon, \qquad (11)$$

we split the $\alpha, \beta, \gamma, \delta$ coordinates into two-variable blocks.
Let us assume that the split is given by $(\alpha, \delta), (\beta, \gamma)$.
For the approximation inequality in Eq. (11) to hold it is
sufficient that

$$(b\,5^{-L/2} - \beta)^2 + (c\,5^{-L/2} - \gamma)^2 < \epsilon^2 \qquad (12)$$

and

$$(a\,5^{-L/2} - \alpha)^2 + (d\,5^{-L/2} - \delta)^2 < \epsilon^2. \qquad (13)$$
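
The sufficiency of the two block conditions follows from a short computation. The sketch below assumes the distance $dist(U, V) = \sqrt{1 - |tr(UV^\dagger)|/2}$, the form consistent with Observation 1, and writes $u = (\alpha, \beta, \gamma, \delta)$ and $v = (a, b, c, d)\, 5^{-L/2}$ for the two unit vectors, so that $tr(UV^\dagger)/2 = u \cdot v$.

```latex
\begin{aligned}
\|u - v\|^2 &= (a\,5^{-L/2} - \alpha)^2 + (d\,5^{-L/2} - \delta)^2
             + (b\,5^{-L/2} - \beta)^2 + (c\,5^{-L/2} - \gamma)^2 < 2\epsilon^2, \\
u \cdot v &= 1 - \tfrac{1}{2}\|u - v\|^2 > 1 - \epsilon^2, \\
dist(G, \nu(v))^2 &= 1 - |u \cdot v| \leq 1 - u \cdot v < \epsilon^2 .
\end{aligned}
```

The second line uses only that u and v both have unit length.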



Our goal is to achieve $\epsilon = 5^{-L/3}$. It is easy to see
that there are approximately $\pi 5^{L/3}$ integer pairs satisfying
each of the conditions in Eq. (12) and Eq. (13) for that
$\epsilon$. We can now sweep over all of the (b, c) integer pairs
and build a hash table of all of the $5^L - b^2 - c^2$ differences
occurring in the first set. Then we can sweep over all of
the (a, d) integer pairs from the second set, in search of
one for which $a^2 + d^2$ occurs in the hash table.

Using number-theoretical considerations (see, for example,
[17]), one can reduce the number of candidates
considered in this direct search by a factor of approximately
$\frac{LR}{2\sqrt{L \ln(5)}}$ (where LR is the Landau-Ramanujan
constant). Thus, for L = 34 the reduction factor is approximately
0.05.
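
The pruning rests on the fact that only a thin set of integers is a sum of two squares. The sketch below shows one way such a filter could look, using Fermat's two-square criterion with trial division, which is an illustrative choice suited only to small inputs and not necessarily the method used in the paper. It also evaluates the reduction factor under the assumption that it reads $LR / (2\sqrt{L \ln 5})$, which reproduces the quoted value of about 0.05 at L = 34. The names and the decimal value of the constant are supplied here for illustration.

```python
import math

# Approximate decimal value of the Landau-Ramanujan constant.
LANDAU_RAMANUJAN = 0.7642236535892206


def is_sum_of_two_squares(m):
    """Fermat's criterion: every prime p = 3 (mod 4) divides m to an even power.

    Trial division is used for simplicity, so this is only practical for small m.
    """
    if m < 0:
        return False
    if m == 0:
        return True
    p = 2
    while p * p <= m:
        if m % p == 0:
            exponent = 0
            while m % p == 0:
                m //= p
                exponent += 1
            if p % 4 == 3 and exponent % 2 == 1:
                return False
        p += 1
    # Whatever remains is 1 or a single prime factor.
    return m % 4 != 3


def reduction_factor(L):
    """Assumed form of the candidate-reduction factor: LR / (2*sqrt(L*ln 5))."""
    return LANDAU_RAMANUJAN / (2.0 * math.sqrt(L * math.log(5)))
```

In the search, such a test would be applied to each difference $5^L - b^2 - c^2$ before it is stored in the hash table.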

For target unitaries that cannot be approximated to
precision $5^{-L/3}$, the algorithm iteratively triples the precision
goal (which has the effect of expanding the search
space at each iteration) until a satisfactory candidate
is found.

The outline of the algorithm is given in Algorithm 3.

FIG. 4. V count versus mean precision $\epsilon$ (measured by trace
distance) of the approximation of 1000 random unitaries. The
plot shows the $5^{-L/3}$ precision goal (dashed pink), the experimental
average (solid blue), and the worst cases (dotted
green).



Algorithm 3 Direct Search Approximation

Input: Accuracy $\epsilon$, Target gate $G = \alpha I + \beta iX + \gamma iY + \delta iZ$

  $L \leftarrow \lfloor 3 \log_5(1/\epsilon) \rfloor$
  hash $\leftarrow$ Dictionary⟨Integer, (Integer * Integer)⟩
  $bound_\pm \leftarrow 5^L (\sqrt{\alpha^2 + \delta^2} \pm \epsilon)^2$
  for all $b, c \in \mathbb{Z}$ satisfying Eq. (12) do
    if $bound_- \leq 5^L - b^2 - c^2 \leq bound_+$ and $5^L - b^2 - c^2$
        is decomposable into two squares then
      Add $(5^L - b^2 - c^2, (b, c)) \to$ hash
    end if
  end for
  completion $\leftarrow$ fail
  for all integer pairs (a, d) satisfying Eq. (13) do
    if hash contains key equal to $a^2 + d^2$ then
      completion $\leftarrow$ (a, b, c, d)
      Break;
    end if
  end for
  if completion $\neq$ fail then
    completion $\leftarrow$ completion $\cdot\, (I, iX, iY, iZ)\, 5^{-L/2}$
  end if
  return completion
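
The two-sweep search can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the authors' implementation: it omits the $bound_\pm$ and two-squares filters, takes L as an input instead of deriving it from $\epsilon$, assumes the distance $\sqrt{1 - |tr(UV^\dagger)|/2}$, and caps the number of precision-tripling rounds with a hypothetical `max_rounds` parameter. All function names are introduced here.

```python
import math


def integer_disc(cx, cy, radius):
    """Integer pairs (x, y) with (x - cx)^2 + (y - cy)^2 < radius^2."""
    pairs = []
    for x in range(math.floor(cx - radius), math.ceil(cx + radius) + 1):
        for y in range(math.floor(cy - radius), math.ceil(cy + radius) + 1):
            if (x - cx) ** 2 + (y - cy) ** 2 < radius ** 2:
                pairs.append((x, y))
    return pairs


def direct_search(target, L, eps):
    """Find integers (a, b, c, d) with squared norm 5^L near 5^(L/2) * target."""
    alpha, beta, gamma, delta = target
    n = 5 ** L
    scale = math.sqrt(n)
    # First sweep: hash the differences 5^L - b^2 - c^2 for pairs near (beta, gamma).
    table = {}
    for b, c in integer_disc(beta * scale, gamma * scale, eps * scale):
        remainder = n - b * b - c * c
        if remainder >= 0:
            table.setdefault(remainder, (b, c))
    # Second sweep: look for a pair near (alpha, delta) whose a^2 + d^2 is in the table.
    for a, d in integer_disc(alpha * scale, delta * scale, eps * scale):
        match = table.get(a * a + d * d)
        if match is not None:
            return (a, match[0], match[1], d)
    return None


def distance(target, point, L):
    """Assumed metric sqrt(1 - |u.v|) between the target and point / 5^(L/2)."""
    scale = math.sqrt(5 ** L)
    dot = sum(t * p for t, p in zip(target, point)) / scale
    return math.sqrt(max(0.0, 1.0 - abs(dot)))


def approximate(target, L, max_rounds=3):
    """Start at eps = 5^(-L/3) and triple the precision goal after each failure."""
    eps = 5.0 ** (-L / 3)
    for _ in range(max_rounds):
        hit = direct_search(target, L, eps)
        if hit is not None:
            return hit, eps
        eps *= 3.0
    return None, eps
```

For example, `approximate((0.6, 0.8, 0.0, 0.0), 6)` searches discs of radius 5 around (75, 0) and (100, 0) on the sphere of radius 125, where the exact grid point (75, 100, 0, 0) guarantees at least one match.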



E. Experimental Results and Comparison 

The chart in Figure 4 presents the results of evaluating
our algorithm on a set of 1000 random unitaries. The
vertical axis plots precision $\epsilon$ on a logarithmic scale. The
horizontal axis plots the maximum V count allowed in
the approximating circuit. The dashed pink curve represents
the tight precision target of $5^{-V_c/3}$. The solid blue
curve represents the average approximation distance over
the set of test unitaries; the error bars measure the standard
deviation around the average. The green dotted
curve plots the worst cases. For a small number of test
unitaries, the algorithm could not find an approximating
sequence with precision $5^{-v/3}$ or better for V count v.

  ε          RA, Median    DS, Median    DS, Worst
  10^{-3}    56.5          13            15
  10^{-4}    73.5          15.9          18
  10^{-5}    91            20.5          22
  10^{-6}    108           24.6          26
  10^{-7}    125           28.95         31
  10^{-8}    142.5         33.2          35
  10^{-9}    159.5         37.3          39

TABLE I. V counts for precision $\epsilon$ for two V basis decomposition
algorithms: Randomized Approximation (RA) (Algorithm 2)
and Direct Search (DS) (Algorithm 3), for 1000
random non-axial rotations. Columns 2 and 3 list the median
V count; Column 4 lists the V count for the worst case.

In practice, experimental evidence suggests that this
algorithm works well for the majority of non-axial unitary
rotations. We have found that approximation circuits
obtained by Algorithm 2 are about 4 times deeper
than the circuits produced by direct search using Algorithm 3.
This factor primarily arises because the non-axial
rotation is first broken into three axial components
and then a more liberal precision of $\epsilon_4(L)$ is pursued for
each component.

Table I compares the V count and precision values
for Algorithms 2 and 3 for precisions above $10^{-9}$. The results
support the rough factor of 4 reduction in V count
achieved by Algorithm 3. The improvement is also apparent
from the plot shown in Figure 1 (green versus black
curves).
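
The factor of roughly 4 can be read directly off Table I. The short sketch below recomputes the RA-to-DS ratio of median V counts for each row and sets the worst-case DS counts against the edge-case bound $4\log_5(2/\epsilon)$; the row values are transcribed from the table, and the variable names are illustrative.

```python
import math

# Rows of Table I: (eps, RA median, DS median, DS worst).
TABLE_I = [
    (1e-3, 56.5, 13.0, 15),
    (1e-4, 73.5, 15.9, 18),
    (1e-5, 91.0, 20.5, 22),
    (1e-6, 108.0, 24.6, 26),
    (1e-7, 125.0, 28.95, 31),
    (1e-8, 142.5, 33.2, 35),
    (1e-9, 159.5, 37.3, 39),
]


def log5(x):
    """Logarithm base 5."""
    return math.log(x) / math.log(5)


summary = []
for eps, ra_median, ds_median, ds_worst in TABLE_I:
    summary.append({
        "eps": eps,
        "ratio": ra_median / ds_median,        # RA depth relative to DS depth
        "ds_worst": ds_worst,
        "edge_bound": 4 * log5(2 / eps),       # bound quoted for edge cases
    })
```

Across the seven rows the ratio stays between about 4.3 and 4.6.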



VI. CONCLUSIONS AND FUTURE WORK 

In conclusion, we have proposed two novel algorithms
for decomposing a single-qubit unitary into the V basis,
an efficiently universal basis that has some advantages
over the (H, T) basis. Our algorithms
produce efficient circuits that approximate a single-qubit
unitary with high precision, and are computationally efficient
in practice. Assuming a V gate has the same cost as
a T gate, our algorithms produce the shortest-depth
approximation circuits known.

A key direction for future research is to determine a
low-cost, exact implementation of a V basis gate, which
could include a native implementation on a given quantum
computer architecture. Discovery of such techniques
would enable us to execute quantum circuits in the V basis
at a quantum cost that is significantly lower than the
cost of executing equivalent circuits in the (H, T) basis.
It would also motivate further research into decomposition
into other basis sets.



Appendix A: V Gate Implementation

Any V gate can be approximated using, for example, an
(H, T) decomposition algorithm, but this is approximate
and requires a sequence length of 70 or more, depending
on the desired precision. In this appendix, we describe
an exact implementation of the V gate using the protocol
given in Ref. El. For additional details on the protocol,
we refer the reader to Ref. [Sj. This method can be used
to implement any of the V gates; here we show the implementation
for $V_3$.

[Circuit diagram: the resource state $e^{-i\pi/8} H S^\dagger |H_2\rangle$ is coupled to $|\psi\rangle$ and measured with outcome m; the output is $Z((-1)^m 2\theta_2)|\psi\rangle$.]

FIG. 5. Circuit to rotate by angle $\pm 2\theta_2$ around the Z-axis.

1. Implementing V3 

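We implement the $V_3$ gate exactly, using a probabilistic
circuit and a non-stabilizer resource state denoted as
$|H_2\rangle$, where

$$V_3 = (I + 2iZ)/\sqrt{5}.$$

In matrix form, this gate can be represented as:

$$V_3 = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 + 2i & 0 \\ 0 & 1 - 2i \end{pmatrix}
= \frac{1 + 2i}{\sqrt{5}} \begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix}.$$

Ignoring the global phase, we can solve for the angle of
rotation about the Z axis using the following identity:

$$e^{i\theta} = \cos\theta + i\sin\theta = -\frac{3}{5} - \frac{4}{5} i
\;\Rightarrow\; \theta = \cos^{-1}\left(-\frac{3}{5}\right) \approx 4.06889,$$

where the branch of $\cos^{-1}$ with $\sin\theta < 0$ is taken.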
5 



Consider the angle $\theta' = \cos^{-1}(\tfrac{3}{5}) \approx 0.927295$. This
angle is $\pi$ away from $\theta$: $\theta = \theta' + \pi$. Thus, if we desire
the rotation $Z(\theta)$, we can implement the gate sequence
$Z(\theta) = Z(\theta')Z(\pi)$, where $Z(\pi)$ is the Pauli Z gate.

Observe that

$$\theta' = 2\theta_2 + \frac{\pi}{4},$$

where $2\theta_2$ is the angle resulting from using the resource
state $e^{-i\pi/8} H S^\dagger |H_2\rangle$. The $\frac{\pi}{4}$ part of the angle is a
$T = Z(\pi/4)$ gate, thus $Z(\theta') = Z(2\theta_2)T$, and $Z(\theta) =
Z(2\theta_2)TZ$.
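
The angle identity is exact, since $\tan(2\theta_2) = 1/7$ when $\cot\theta_2 = \cot^3(\pi/8)$, which gives $\tan(2\theta_2 + \pi/4) = 4/3 = \tan\theta'$. The sketch below confirms it numerically and checks that the composed diagonal gate matches $V_3$ up to a global phase. It assumes the convention $Z(\phi) = diag(1, e^{i\phi})$, which is consistent with $T = Z(\pi/4)$ and $Z(\pi)$ being the Pauli Z gate; the helper names are illustrative.

```python
import cmath
import math

theta0 = math.pi / 8
theta2 = math.atan(math.tan(theta0) ** 3)   # cot(theta2) = cot(theta0)^3
theta_prime = math.acos(3 / 5)
theta = theta_prime + math.pi


def z_rot(angle):
    """Diagonal of Z(angle) = diag(1, exp(i*angle)) (assumed convention)."""
    return (1.0 + 0j, cmath.exp(1j * angle))


def diag_mul(p, q):
    """Product of two diagonal 2x2 matrices given by their diagonals."""
    return (p[0] * q[0], p[1] * q[1])


# Z(2*theta2) * T * Z, with T = Z(pi/4) and Pauli Z = Z(pi).
composed = diag_mul(diag_mul(z_rot(2 * theta2), z_rot(math.pi / 4)), z_rot(math.pi))

# V_3 = diag(1 + 2i, 1 - 2i)/sqrt(5), rescaled so that its first entry is 1.
v3 = ((1 + 2j) / math.sqrt(5), (1 - 2j) / math.sqrt(5))
v3_phase_free = (1.0 + 0j, v3[1] / v3[0])
```

Dividing out the first diagonal entry removes the global phase $(1 + 2i)/\sqrt{5}$ before the comparison.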

The circuit to obtain a rotation of $Z(2\theta_2)$ is given in
Fig. 5. The circuit results in the application of $Z(\pm 2\theta_2)$ to
$|\psi\rangle$, each with equal probability. If m = 0, then $Z(2\theta_2)$
has been applied. If m = 1, then $Z(-2\theta_2)$ has been applied and
we must apply $Z(4\theta_2)$ to correct it. Further details on the
m = 1 case are given in Section A 3.



2. Obtaining an $|H_2\rangle$ Resource State

To implement $V_3$, we require a non-stabilizer state
$|H_2\rangle$, which can be obtained using the ladder given in
Ref. GO. We begin by describing how to obtain the ladder
state $|H_2\rangle$, and then describe how to implement $V_3$ using
this resource state.

FIG. 6. Two-qubit circuit used to obtain new $|H_1\rangle$ states
from initial resource states $|H_0\rangle$. Upon measuring the 1
outcome, the output state is discarded.



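The circuit of Fig. 6 measures the parity of the two
input qubits and decodes the resulting state into the second
qubit. Let the two inputs be magic states $|H\rangle$ and
define $\theta_0 = \frac{\pi}{8}$:

$$|H\rangle = |H_0\rangle = \cos\theta_0 |0\rangle + \sin\theta_0 |1\rangle.$$

Upon application of the controlled-NOT gate $\Lambda(X)$,

$$|H_0\rangle |H_0\rangle \xrightarrow{\Lambda(X)} \cos^2\theta_0 |00\rangle + \sin^2\theta_0 |01\rangle
+ \cos\theta_0 \sin\theta_0 (|11\rangle + |10\rangle).$$

Upon measurement m of the first qubit, we have

$$\xrightarrow{m=0} \frac{\cos^2\theta_0 |0\rangle + \sin^2\theta_0 |1\rangle}{\sqrt{\cos^4\theta_0 + \sin^4\theta_0}},
\qquad
\xrightarrow{m=1} \frac{1}{\sqrt{2}} (|0\rangle + |1\rangle).$$

We define $\theta_1$ such that

$$\cos\theta_1 |0\rangle + \sin\theta_1 |1\rangle
= \frac{\cos^2\theta_0 |0\rangle + \sin^2\theta_0 |1\rangle}{\sqrt{\cos^4\theta_0 + \sin^4\theta_0}},$$

from which we deduce

$$\cot\theta_1 = \cot^2\theta_0.$$

Thus we have

$$|H_1\rangle = \cos\theta_1 |0\rangle + \sin\theta_1 |1\rangle,$$

a non-stabilizer state obtained from $|H\rangle$ states, Clifford
operations, and measurements. If the measurement
outcome is 1, then we obtain a stabilizer state
and discard the output (see Fig. 6). The measurement
outcomes occur with respective probabilities
$p_{m=0,0} = \cos^4\theta_0 + \sin^4\theta_0 = \frac{3}{4}$ and
$p_{m=1,0} = 1 - p_{m=0,0} = \frac{1}{4}$.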

Now consider the next step of the ladder. We recurse
on this protocol using the non-stabilizer states produced
by the previous round of the protocol as input to the
circuit in Fig. 6. In this case, we need only go to state
$|H_2\rangle$, which is defined as

$$|H_2\rangle = \cos\theta_2 |0\rangle + \sin\theta_2 |1\rangle,$$

where

$$\cot\theta_2 = \cot^3\theta_0.$$

To obtain this state, we use as input the previously
produced $|H_1\rangle$ state and a new $|H_0\rangle$ state:

$$|H_0\rangle |H_1\rangle \xrightarrow{\Lambda(X)} \cos\theta_0 \cos\theta_1 |00\rangle + \sin\theta_0 \sin\theta_1 |01\rangle
+ \sin\theta_0 \cos\theta_1 |10\rangle + \cos\theta_0 \sin\theta_1 |11\rangle.$$

Upon measurement of the first qubit, we have

$$\xrightarrow{m=0} (\cos\theta' |0\rangle + \sin\theta' |1\rangle), \qquad
\xrightarrow{m=1} (\cos\theta'' |0\rangle + \sin\theta'' |1\rangle),$$

where

$$\cot\theta' = \cot\theta_1 \cot\theta_0 = \cot^3\theta_0 = \cot\theta_2,$$
$$\cot\theta'' = \cot\theta_1 \tan\theta_0 = \cot^1\theta_0 = \cot\theta_0.$$

Thus, if we measure m = 0, we obtain the state $|H_2\rangle$
and if we measure m = 1, we obtain $|H_0\rangle$. The probability
of measuring 0 is given by

$$p_{m=0,1} = \cos^2\theta_1 \cos^2\theta_0 + \sin^2\theta_1 \sin^2\theta_0.$$

Note that $\frac{3}{4} < p_{m=0,1} < \cos^2\frac{\pi}{8} = 0.853...$, so the
probability of obtaining $|H_2\rangle$ is far higher than the probability
of obtaining $|H_0\rangle$.
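
The two ladder rounds can be reproduced numerically from the post-CNOT amplitudes. The following sketch is illustrative: `ladder_step` is a hypothetical helper that takes the angle of the previously produced state and of the fresh $|H_0\rangle$ state, and returns the probability of the m = 0 outcome together with the angles of the two possible output states.

```python
import math


def ladder_step(theta_prev, theta0):
    """One parity-measurement round on |H_0>|H_prev>.

    Returns (p_up, theta_up, theta_down): the probability of m = 0, the angle
    of the output state for m = 0, and the angle of the output state for m = 1.
    """
    c0, s0 = math.cos(theta0), math.sin(theta0)
    c1, s1 = math.cos(theta_prev), math.sin(theta_prev)
    # m = 0 branch: amplitudes of |00> and |01> after the CNOT.
    p_up = (c0 * c1) ** 2 + (s0 * s1) ** 2
    theta_up = math.atan2(s0 * s1, c0 * c1)
    # m = 1 branch: amplitudes of |10> and |11> after the CNOT.
    theta_down = math.atan2(c0 * s1, s0 * c1)
    return p_up, theta_up, theta_down


theta0 = math.pi / 8
p0, theta1, down0 = ladder_step(theta0, theta0)   # |H_0>|H_0> -> |H_1>
p1, theta2, down1 = ladder_step(theta1, theta0)   # |H_0>|H_1> -> |H_2>
```

With these definitions the second-round success probability evaluates to about 0.8333, inside the interval quoted in the text.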

3. Resource Cost 

What is the cost of obtaining a $|H_2\rangle$ state in terms of
$|H_0\rangle$ resource states? We simulated 10 million instances
of the ladder to determine the average cost of obtaining
$|H_1\rangle$ and $|H_2\rangle$. Recall that the probabilities of moving
"up" the ladder are higher than moving "down" the ladder.
For $|H_1\rangle$, the cost is on average 2.66 $|H_0\rangle$ states,
with a median cost of 2. For $|H_2\rangle$, the cost is on average
4.35 $|H_0\rangle$ states, with a median cost of 3.

What is the cost of implementing $Z(\theta')$? Recall that
our technique uses a probabilistic circuit with a success
probability of 1/2. Thus, on average it will require two
attempts for success.

If the circuit succeeds, the cost in $|H_0\rangle$ states is roughly
5.35. If the circuit fails, then we must correct the circuit
by applying a Z rotation of $2 \cdot 2\theta_2$. This requires preparing
a resource state $|Z(4\theta_2)\rangle$, which can be done using the
circuit given in Fig. 5 with $|\psi\rangle = e^{-i\pi/8} H S^\dagger |H_2\rangle$. On
average, two attempts will be required to prepare the
state, resulting in an average cost of 4 $|H_2\rangle$ states, or
roughly 4 * 4.35 = 17.4 $|H_0\rangle$ states. The prepared state is applied
to the target qubit $|\psi\rangle$ using the same circuit in Fig. 5,
except now the top input qubit is $|Z(4\theta_2)\rangle$. The total
cost if the circuit succeeds on this second attempt, after
the first failure, is 1 + 4.35 + 17.4 = 22.75.
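
The cost bookkeeping in this paragraph is simple arithmetic and can be laid out explicitly. In the sketch below, `H2_COST` is the simulated average quoted in the text, while `EXTRA_COST` stands for the additional unit in the sum 1 + 4.35; reading that unit as the resource for the T gate is an assumption made here, and all variable names are illustrative.

```python
import math

H2_COST = 4.35      # average number of |H_0> states per |H_2>, as quoted in the text
EXTRA_COST = 1.0    # the extra unit in "1 + 4.35" (assumed to account for the T gate)

# Cost when the first attempt succeeds.
first_try = EXTRA_COST + H2_COST

# Preparing |Z(4*theta_2)> takes 4 |H_2> states on average, as stated in the text.
z4_state = 4 * H2_COST

# Cost when the first attempt fails and the second attempt succeeds.
second_try = EXTRA_COST + H2_COST + z4_state
```

The jump from 5.35 to 22.75 shows how quickly a single failure raises the resource cost.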

As can be seen, each attempt that fails requires preparation
of a more costly resource state for the next attempt.
The series of attempts is a negative binomial of
parameter $p = \frac{1}{2}$ and the expected number of attempts to
achieve success goes as $\frac{1}{p} = 2$. In general, at attempt
k, a resource state to perform rotation by angle $2^k \cdot 2\theta_2$ is
required. The cost of preparing the resource state grows
exponentially in k, and in the limit is infinite. However,
in practice, we will only make 1-3 attempts, and upon
the final failure, apply a different approximation technique
to the remaining rotation R using methods of, for
example, Refs. [HE]. The optimal number of attempts to
make before backing off to a different technique can be
determined based on the required precision level (since
the backoff method will only be approximate) and the
chosen technique.



5 We may in fact apply the backoff technique to the entire remaining
sequence, that is, by determining the unitary from the
remaining sequence and approximating it with the backoff technique.



